124 research outputs found

    The XBabelPhish MAGE-ML and XML Translator

    Background: MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard. First, MAGE-ML files are exceptionally large: too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML. Results: We have developed XBabelPhish, an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML. Conclusion: XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.

    Engineering Design with Digital Thread

    Digital Thread offers the opportunity to use information generated across the product lifecycle to design the next generation of products. In this paper, we introduce a mathematical methodology that establishes the data-driven design and decision problem associated with Digital Thread. Our objectives are twofold: 1) Provide a mathematical definition of Digital Thread in the context of conceptual and preliminary design and establish a methodology for how information along the Digital Thread enters into the design problem as well as how design decisions affect the Digital Thread. 2) Develop a data-driven design method that incorporates data from different sources from across the product life cycle. We illustrate aspects of our methodology through an example design of a structural fiber-steered composite component. United States. Air Force. Office of Scientific Research (Grant FA9550-16-1-0108); SUTD-MIT International Design Centre (IDC)

    Implementation of GenePattern within the Stanford Microarray Database

    Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability

    Efficient gene-driven germ-line point mutagenesis of C57BL/6J mice

    BACKGROUND: Analysis of an allelic series of point mutations in a gene, generated by N-ethyl-N-nitrosourea (ENU) mutagenesis, is a valuable method for discovering the full scope of its biological function. Here we present an efficient gene-driven approach for identifying ENU-induced point mutations in any gene in C57BL/6J mice. The advantage of such an approach is that it allows one to select any gene of interest in the mouse genome and to go directly from DNA sequence to mutant mice. RESULTS: We produced the Cryopreserved Mutant Mouse Bank (CMMB), which is an archive of DNA, cDNA, tissues, and sperm from 4,000 G1 male offspring of ENU-treated C57BL/6J males mated to untreated C57BL/6J females. Each mouse in the CMMB carries a large number of random heterozygous point mutations throughout the genome. High-throughput Temperature Gradient Capillary Electrophoresis (TGCE) was employed to perform a 32-Mbp sequence-driven screen for mutations in 38 PCR amplicons from 11 genes in DNA and/or cDNA from the CMMB mice. DNA sequence analysis of heteroduplex-forming amplicons identified by TGCE revealed 22 mutations in 10 genes for an overall mutation frequency of 1 in 1.45 Mbp. All 22 mutations are single base pair substitutions, and nine of them (41%) result in nonconservative amino acid substitutions. Intracytoplasmic sperm injection (ICSI) of cryopreserved spermatozoa into B6D2F1 or C57BL/6J ova was used to recover mutant mice for nine of the mutations to date. CONCLUSIONS: The inbred C57BL/6J CMMB, together with TGCE mutation screening and ICSI for the recovery of mutant mice, represents a valuable gene-driven approach for the functional annotation of the mammalian genome and for the generation of mouse models of human genetic diseases. The ability of ENU to induce mutations that cause various types of changes in proteins will provide additional insights into the functions of mammalian proteins that may not be detectable by knockout mutations.
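
    The mutation frequency and the nonconservative fraction quoted in this abstract can be checked from the other figures it reports. The sketch below is only an arithmetic check, not part of the authors' analysis; the variable names are illustrative.

```python
# Figures quoted in the abstract above.
screened_mbp = 32.0     # total sequence screened, in Mbp
mutations_found = 22    # confirmed single base pair substitutions
nonconservative = 9     # mutations giving nonconservative amino acid changes

# Average amount of screened sequence per mutation found.
mbp_per_mutation = screened_mbp / mutations_found

# Share of the mutations that are nonconservative, as a percentage.
nonconservative_pct = 100.0 * nonconservative / mutations_found

print(f"1 mutation per {mbp_per_mutation:.2f} Mbp")
print(f"{nonconservative_pct:.0f}% nonconservative")
```

    Both values agree with the abstract after rounding: 32 / 22 is about 1.45 Mbp per mutation, and 9 / 22 is about 41%.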

    TB database: an integrated platform for tuberculosis research

    The effective control of tuberculosis (TB) has been thwarted by the need for prolonged, complex and potentially toxic drug regimens, by reliance on an inefficient vaccine and by the absence of biomarkers of clinical status. The promise of the genomics era for TB control is substantial, but has been hindered by the lack of a central repository that collects and integrates genomic and experimental data about this organism in a way that can be readily accessed and analyzed. The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources, relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28 different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives. TBDB currently hosts data for nearly 1500 public tuberculosis microarrays and 260 arrays for Streptomyces. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software. By bringing together M. tuberculosis genome annotation and gene-expression data with a suite of analysis tools, TBDB (http://www.tbdb.org/) provides a unique discovery platform for TB research

    A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB

    BACKGROUND: Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. RESULTS: We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. CONCLUSION: MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML

    Non-Canonicaly Recruited TCRαβCD8αα IELs Recognize Microbial Antigens

    In the gut, various subsets of intraepithelial T cells (IELs) respond to self or non-self antigens derived from the body, diet, and commensal and pathogenic microbiota. The dominant subset of IELs in the small intestine is TCRαβCD8αα+ cells, which are derived from immature thymocytes that express self-reactive TCRs. Although most TCRαβCD8αα+ IELs are thymus-derived, their repertoire adapts to the microbial flora. Here, using high-throughput TCR sequencing, we examined how the clonal diversity of TCRαβCD8αα+ IELs changes upon exposure to commensal-derived antigens. We found that a fraction of CD8αα+ IELs and CD4+ T cells express identical αβTCRs, and that this overlap rose in parallel with a surge in the diversity of the microbial flora. We also found that an opportunistic pathogen (Staphylococcus aureus) isolated from mouse small intestine specifically activated T cell hybridomas derived from CD8αα+ IELs and CD4+ T cells, suggesting that some TCRαβCD8αα+ clones with microbial specificities have an extrathymic origin. We also report that CD8ααCD4+ IELs and Foxp3CD4+ T cells from the small intestine shared many αβTCRs, regardless of whether the latter subset was isolated from Foxp3CNS1-sufficient mice or from Foxp3CNS1-deficient mice that lack peripherally derived Tregs. Overall, our results imply that the repertoire of TCRαβCD8αα+ IELs in the small intestine expands in situ in response to changes in the microbial flora.

    MPI-PHYLIP: Parallelizing Computationally Intensive Phylogenetic Analysis Routines for the Analysis of Large Protein Families

    Background: Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses. Methodology: Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets. Conclusions: Calculations that currently take a few days on a state of the art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to mor